Efficient Phrase-Table Representation for Machine Translation with Applications to Online MT and Speech Translation

Authors

  • Richard Zens
  • Hermann Ney
Abstract

In phrase-based statistical machine translation, the phrase-table requires a large amount of memory. We will present an efficient representation with two key properties: on-demand loading and a prefix tree structure for the source phrases. We will show that this representation scales well to large data tasks and that we are able to store hundreds of millions of phrase pairs in the phrase-table. For the large Chinese–English NIST task, the memory requirements of the phrase-table are reduced to less than 20 MB using the new representation with no loss in translation quality and speed. Additionally, the new representation is not limited to a specific test set, which is important for online or real-time machine translation. One problem in speech translation is the matching of phrases in the input word graph and the phrase-table. We will describe a novel algorithm that effectively solves this combinatorial problem exploiting the prefix tree data structure of the phrase-table. This algorithm enables the use of significantly larger input word graphs in a more efficient way resulting in improved translation quality.
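
The two ideas named in the abstract, a prefix tree over source phrases and on-demand loading, can be illustrated with a small Python sketch. This is not the authors' implementation: the class names, the in-memory `backing_store` dictionary that stands in for on-disk data, and the word-graph encoding are all assumptions made for illustration. For simplicity, only the translation lists are fetched lazily here, while the tree itself is built in memory.

```python
class PrefixTreeNode:
    """One node per source-phrase prefix; children are keyed by the next word."""

    def __init__(self):
        self.children = {}
        self.key = None        # set only if a source phrase ends at this node
        self._targets = None   # translations, fetched on first access


class PhraseTable:
    """Hypothetical phrase table: prefix tree plus lazily loaded translations."""

    def __init__(self, backing_store):
        # backing_store is a stand-in for on-disk data (assumption):
        # it maps a source phrase to its list of translations.
        self.backing_store = backing_store
        self.root = PrefixTreeNode()
        self.loads = 0  # number of translation lists actually fetched
        for source in backing_store:
            node = self.root
            for word in source.split():
                node = node.children.setdefault(word, PrefixTreeNode())
            node.key = source

    def targets(self, node):
        """Fetch the translations of a phrase node the first time they are needed."""
        if node._targets is None:
            node._targets = self.backing_store[node.key]
            self.loads += 1
        return node._targets

    def match_sentence(self, words, start):
        """Yield (end, translations) for each source phrase starting at words[start]."""
        node = self.root
        for end in range(start, len(words)):
            node = node.children.get(words[end])
            if node is None:
                return  # no longer phrase shares this prefix, so stop early
            if node.key is not None:
                yield end + 1, self.targets(node)

    def match_graph(self, edges, start_state):
        """Yield (end_state, source_phrase) for phrases on paths leaving start_state.

        edges maps a state to a list of (word, next_state) arcs and is
        assumed to be acyclic. Graph and tree are traversed in lockstep.
        """
        stack = [(start_state, self.root)]
        while stack:
            state, node = stack.pop()
            for word, next_state in edges.get(state, []):
                child = node.children.get(word)
                if child is None:
                    continue  # prune: no phrase in the table has this prefix
                if child.key is not None:
                    yield next_state, child.key
                stack.append((next_state, child))


store = {
    "the": ["der", "die", "das"],
    "the house": ["das Haus"],
    "the small house": ["das kleine Haus"],
}
table = PhraseTable(store)
for end, translations in table.match_sentence("the house is small".split(), 0):
    print(end, translations)
```

In this sketch, a lookup stops as soon as the current prefix has no continuation in the tree, and translations are fetched only for phrases that actually occur in the input, so the table is not tied to a pre-filtered test set. The `match_graph` method applies the same pruning to a word graph by walking graph arcs and tree edges together.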


Related resources

Combining Multi-Engine Machine Translation and Online Learning through Dynamic Phrase Tables

Extending phrase-based Statistical Machine Translation systems with a second, dynamic phrase table has been done for multiple purposes. Promising results have been reported for hybrid or multi-engine machine translation, i.e., building a phrase table from the knowledge of external MT systems, and for online learning. We argue that, in prior research, dynamic phrase tables are not scored optimal...

Hierarchical Phrase-Based MT for Phonetic Representation-Based Speech Translation

The paper presents a novel technique for speech translation using hierarchical phrase-based statistical machine translation (HPBSMT). The system is based on translation of speech from phone sequences as opposed to the conventional approach of speech translation from word sequences. The technique facilitates speech translation by allowing a machine translation (MT) system to access phonetic infor...

Dynamic Models in Moses for Online Adaptation

A very hot issue for research and industry is how to effectively integrate machine translation (MT) within computer assisted translation (CAT) software. This paper focuses on this issue, and more generally how to dynamically adapt phrase-based statistical machine translation (SMT) by exploiting external knowledge, like the post-editions from professional translators. We present an enhancement of t...

Phrase Based Language Model For Statistical Machine Translation

We consider phrase-based Language Models (LMs), which generalize the commonly used word-level models. A similar concept of phrase-based LMs appears in speech recognition, but it is rather specialized and thus less suitable for machine translation (MT). In contrast to the dependency LM, we first introduce the exhaustive phrase-based LMs tailored for MT use. Preliminary experimental results show that...

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

Publication date: 2007